suppressPackageStartupMessages({
  library("tidyverse")
  library("plotly")
  library("treeio")
  library("ggtree")
  library("ggtreeExtra")
  library("RColorBrewer")
  library("cowplot")
  library("shadowtext")
})
Creates supplementary table 1, with BGCs as observations
Rscript notebook/bgc_table.R \
--antismash_dir ~/wwtphqmags/antismash/6.0.1/ \
--bigscape_dir ~/wwtphqmags/bigscape/wwtphqmags_antismash_6.0.1/network_files/2022-02-10_10-03-49_glocal_wwtphqmags_antismash_6.0.1/ \
--output tables/wwtphqmags_bgcs.csv
## # A tibble: 4,242 × 9
## bgc_id GCF genome_id contig start end product contig_edge class
## <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> <lgl> <chr>
## 1 CP064957.1.re… 1911 GCA_0166… CP064… 2.67e5 3.09e5 NRPS-l… FALSE NRPS
## 2 CP064958.1.re… 1912 GCA_0166… CP064… 2.35e5 2.56e5 CDPS FALSE Othe…
## 3 CP064960.1.re… 1913 GCA_0166… CP064… 1.93e4 4.01e4 hserla… FALSE Othe…
## 4 CP064960.1.re… 1914 GCA_0166… CP064… 1.86e5 2.07e5 terpene FALSE Terp…
## 5 CP064963.1.re… 1915 GCA_0166… CP064… 1.30e5 1.73e5 NRPS-l… FALSE NRPS
## 6 CP064963.1.re… 1916 GCA_0166… CP064… 2.67e5 2.76e5 RiPP-l… FALSE RiPPs
## 7 CP064963.1.re… 1917 GCA_0166… CP064… 1.53e6 1.61e6 hglE-K… FALSE Othe…
## 8 CP064963.1.re… 1918 GCA_0166… CP064… 3.85e6 3.86e6 RiPP-l… FALSE RiPPs
## 9 CP064963.1.re… 1919 GCA_0166… CP064… 4.36e6 4.38e6 RRE-co… FALSE RiPPs
## 10 CP064964.1.re… 1920 GCA_0166… CP064… 2.23e4 4.32e4 terpene FALSE Terp…
## # … with 4,232 more rows
Creates supplementary table 2, with MAGs as observations
Rscript notebook/genome_table.R \
--bgcs_table tables/wwtphqmags_bgcs.csv \
--supplementary_file data/singleton_2021_table3.xlsx \
--assembly_details data/assembly_details.txt \
--output tables/wwtphqmags_genomes.csv
## # A tibble: 1,080 × 11
## genome_id total_bgcs bgcs_on_contig_edge gtdb_taxonomy assembly_level
## <chr> <dbl> <dbl> <chr> <chr>
## 1 GCA_016699045.1 1 0 d__Bacteria;p_… Complete/Chro…
## 2 GCA_016705575.1 5 0 d__Bacteria;p_… Contig/Scaffo…
## 3 GCA_016705605.1 2 0 d__Bacteria;p_… Contig/Scaffo…
## 4 GCA_016705565.1 2 0 d__Bacteria;p_… Contig/Scaffo…
## 5 GCA_016705545.1 4 0 d__Bacteria;p_… Contig/Scaffo…
## 6 GCA_016705525.1 5 1 d__Bacteria;p_… Contig/Scaffo…
## 7 GCA_016705495.1 1 0 d__Bacteria;p_… Contig/Scaffo…
## 8 GCA_016705475.1 6 0 d__Bacteria;p_… Contig/Scaffo…
## 9 GCA_016705465.1 3 1 d__Bacteria;p_… Contig/Scaffo…
## 10 GCA_016705435.1 2 0 d__Bacteria;p_… Contig/Scaffo…
## # … with 1,070 more rows, and 6 more variables: checkm_completeness <dbl>,
## # checkm_contamination <dbl>, genome_size <dbl>, ncbi_bioproject <chr>,
## # mimag_quality <chr>, source <chr>
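The per-genome columns total_bgcs and bgcs_on_contig_edge look like aggregates of the BGC table. The sketch below shows one way such counts could be derived with dplyr. It is an assumption about the approach, not the contents of notebook/genome_table.R, and the toy rows and the names bgcs_toy and genome_counts are hypothetical.

```r
library(dplyr)

# Hypothetical toy stand-in for tables/wwtphqmags_bgcs.csv (only the columns needed here)
bgcs_toy <- tibble(
  bgc_id      = c("c1.region001", "c1.region002", "c2.region001"),
  genome_id   = c("G1", "G1", "G2"),
  contig_edge = c(FALSE, TRUE, FALSE)
)

# One row per genome: number of BGCs, and how many of them sit on a contig edge
genome_counts <- bgcs_toy %>%
  group_by(genome_id) %>%
  summarise(
    total_bgcs          = n(),
    bgcs_on_contig_edge = sum(contig_edge),
    .groups = "drop"
  )

print(genome_counts)
```

Genomes with no BGCs never appear in a grouped summary like this, so a full genome table would also need a join against the complete genome list with the missing counts set to zero.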
Histogram
New classes and color palettes
Barplot
Boxplots
Genome size scatter plots
Pearson correlation between genome size and total BGC count for the complete dataset
cor(x = scatter_ds$genome_size, y = scatter_ds$total_bgcs)
## [1] 0.6175123
Pearson correlation after excluding the three phyla highlighted in the scatter plots (only genomes labelled "Other" are kept)
without_ds <- scatter_ds[scatter_ds$phylum == "Other", ]
cor(x = without_ds$genome_size, y = without_ds$total_bgcs)
## [1] 0.346717
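The two calls above can be reproduced on a small data frame to show how the comparison works. This is a minimal sketch on hypothetical values: scatter_toy, the label "PhylumA" and all numbers are invented for illustration, and only the column names mirror scatter_ds. The cor.test() call is an addition not used in the notebook. It reports the same estimate together with a confidence interval and p-value.

```r
# Hypothetical toy data; column names mirror scatter_ds, values are invented
scatter_toy <- data.frame(
  genome_size = c(2.1e6, 3.4e6, 2.9e6, 4.8e6, 5.2e6, 8.7e6, 9.9e6),
  total_bgcs  = c(2, 1, 4, 3, 5, 14, 11),
  phylum      = c("Other", "Other", "Other", "Other", "Other", "PhylumA", "PhylumA")
)

# Pearson correlation on all rows (method = "pearson" is the default of cor())
r_all <- cor(scatter_toy$genome_size, scatter_toy$total_bgcs, method = "pearson")

# Same statistic restricted to the rows labelled "Other"
other_toy <- scatter_toy[scatter_toy$phylum == "Other", ]
r_other <- cor(other_toy$genome_size, other_toy$total_bgcs, method = "pearson")

# cor.test() returns the same estimate plus a confidence interval and p-value
ct_all <- cor.test(scatter_toy$genome_size, scatter_toy$total_bgcs)

print(c(all = r_all, other = r_other))
print(ct_all$conf.int)
```

If either column contained missing values, cor() would return NA unless use = "complete.obs" were passed.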
Tree of the 1080 genomes
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
Boxplots for relevant genera